Appendix 1

Neural Information Processing Systems

$$\min_{P} \sum_{i,j} P_{i,j} C_{i,j} - \gamma H(P) \quad \text{subject to} \quad P \in \mathbb{R}^{t \times t}_{+},\ P^{\top}\mathbf{1}_t = \mathbf{1}_t,\ P\,\mathbf{1}_t = \mathbf{1}_t, \qquad (6)$$ where $P_{i,j}$ is the transport plan and $C_{i,j}$ is the ground metric that measures the distance between point $i$ in the source and point $j$ in the target. The entropy term induces some smoothness and wiggle room in the solution of our objective. To increase the diversity of the observed trajectories, we inject Gaussian noise ($\sigma = 0.05$) into trajectories by perturbing the initial velocities. Since two-body systems are non-chaotic, we split the training and test sets such that $[m_{\min}, m_{\max}] = [0.8, 1.2]$ for the training set and $[m_{\min}, m_{\max}] = [0.9, 1.3]$ for the test set, creating a domain distribution shift. The initial velocity of each body is derived from its initial position by rotating it by $90^{\circ}$ and scaling it by $r^{1.5}$.
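Equation (6) is the entropic optimal-transport problem, which is commonly solved with Sinkhorn iterations. A minimal numerical sketch (the function name, regularization value, and iteration count are illustrative, not from the paper):

```python
import numpy as np

def sinkhorn(C, gamma, n_iter=500):
    # Entropic OT of Eq. (6): min_P <P, C> - gamma * H(P)
    # s.t. P >= 0, P^T 1_t = 1_t, P 1_t = 1_t.
    # With H the entropy, the optimum has the form diag(u) K diag(v),
    # K = exp(-C / gamma); u and v follow from alternating marginal scaling.
    t = C.shape[0]
    K = np.exp(-C / gamma)            # Gibbs kernel
    u = np.ones(t)
    v = np.ones(t)
    ones = np.ones(t)                 # the 1_t marginals
    for _ in range(n_iter):
        u = ones / (K @ v)            # enforce row sums
        v = ones / (K.T @ u)          # enforce column sums
    return (u[:, None] * K) * v[None, :]

rng = np.random.default_rng(0)
C = rng.random((5, 5))                # illustrative ground metric
P = sinkhorn(C, gamma=0.1)
print(P.sum(axis=0), P.sum(axis=1))   # both approach 1_t
```

Larger `gamma` yields a smoother (more diffuse) plan; smaller `gamma` approaches the unregularized transport plan but slows convergence.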


Dependent Reachable Sets for the Constant Bearing Pursuit Strategy

Makkapati, Venkata Ramana, Vechalapu, Tulasi Ram, Comandur, Vinodhini, Hutchinson, Seth

arXiv.org Artificial Intelligence

This paper introduces a novel reachability problem for the scenario where one agent follows another agent using the constant bearing pursuit strategy, and analyzes the geometry of the reachable set of the follower. Key theoretical results are derived, providing bounds for the associated dependent reachable set. Simulation results are presented to empirically establish the shape of the dependent reachable set. In the process, an original optimization problem for the constant bearing strategy is formulated and analyzed.


Concurrent-Allocation Task Execution for Multi-Robot Path-Crossing-Minimal Navigation in Obstacle Environments

Hu, Bin-Bin, Yao, Weijia, Zhou, Yanxin, Wei, Henglai, Lv, Chen

arXiv.org Artificial Intelligence

Reducing undesirable path crossings among trajectories of different robots is vital in multi-robot navigation missions, which not only reduces detours and conflict scenarios, but also enhances navigation efficiency and boosts productivity. Despite recent progress in multi-robot path-crossing-minimal (MPCM) navigation, the majority of approaches depend on the minimal squared-distance reassignment of suitable desired points to robots directly. However, if obstacles occupy the passing space, calculating the actual robot-point distances becomes complex or intractable, which may render the MPCM navigation in obstacle environments inefficient or even infeasible. In this paper, the concurrent-allocation task execution (CATE) algorithm is presented to address this problem (i.e., MPCM navigation in obstacle environments). First, the path-crossing-related elements in terms of (i) robot allocation, (ii) desired-point convergence, and (iii) collision and obstacle avoidance are encoded into integer and control barrier function (CBF) constraints. Then, the proposed constraints are used in an online constrained optimization framework, which implicitly yet effectively minimizes the possible path crossings and trajectory length in obstacle environments by minimizing the desired point allocation cost and slack variables in CBF constraints simultaneously. In this way, the MPCM navigation in obstacle environments can be achieved with flexible spatial orderings. Note that the feasibility of solutions and the asymptotic convergence property of the proposed CATE algorithm in obstacle environments are both guaranteed, and the calculation burden is also reduced by concurrently calculating the optimal allocation and the control input directly without the path planning process.
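CATE couples allocation and control in one optimization; for intuition, the simpler free-space baseline the abstract contrasts against (minimal squared-distance assignment of desired points to robots, not the paper's algorithm) can be sketched as follows. Brute force is exact for small teams; the Hungarian algorithm scales the same idea to larger ones:

```python
import itertools
import numpy as np

def assign_goals(robots, goals):
    # Minimal squared-distance assignment by exhaustive search.
    # cost[i, j] = ||robot_i - goal_j||^2
    diff = robots[:, None, :] - goals[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    best_perm, best_cost = None, float("inf")
    for perm in itertools.permutations(range(len(robots))):
        c = sum(cost[i, j] for i, j in enumerate(perm))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return best_perm, best_cost

# Two robots whose index-order goals would make their straight-line paths cross:
robots = np.array([[0.0, 0.0], [2.0, 0.0]])
goals = np.array([[2.1, 0.0], [0.1, 0.0]])
perm, cost = assign_goals(robots, goals)
print(perm)   # (1, 0): swapping the goals removes the crossing
```

As the abstract notes, this distance-based reassignment breaks down when obstacles make the true robot-to-point travel cost intractable, which is the gap CATE's CBF-constrained formulation addresses.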


MOFM-Nav: On-Manifold Ordering-Flexible Multi-Robot Navigation

Hu, Bin-Bin, Yao, Weijia, Cao, Ming

arXiv.org Artificial Intelligence

This paper addresses the problem of multi-robot navigation where robots maneuver on a desired $m$-dimensional (i.e., $m$-D) manifold in the $n$-dimensional Euclidean space, and maintain a {\it flexible spatial ordering}. We consider $m \geq 2$, and the multi-robot coordination is achieved via non-Euclidean metrics. However, since the $m$-D manifold can be characterized by the zero-level sets of $n$ implicit functions, the last $m$ entries of the guiding-vector-field (GVF) propagation term become {\it strongly coupled} with the partial derivatives of these functions if the auxiliary vectors are not appropriately chosen. These couplings not only influence the on-manifold maneuvering of robots, but also pose significant challenges to the further design of the ordering-flexible coordination via non-Euclidean metrics. To tackle this issue, we first identify a feasible choice of auxiliary vectors such that the last $m$ entries of the propagation term are effectively decoupled to be the same constant. Then, we redesign the coordinated GVF (CGVF) algorithm to {\it boost} the advantages of singularity elimination and global convergence by treating the $m$ manifold parameters as $m$ additional virtual coordinates. Furthermore, we enable the on-manifold ordering-flexible motion coordination by allowing each robot to share $m$ virtual coordinates with its time-varying neighbors and a virtual target robot, which {\it circumvents} the possibly complex calculation if Euclidean metrics were used instead. Finally, we showcase the proposed algorithm's flexibility, adaptability, and robustness through extensive simulations with different initial positions, higher-dimensional manifolds, and robot breakdown, respectively.
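For intuition on guiding vector fields, here is a sketch of the standard 2-D level-set formulation (a 1-D path in the plane, not the paper's higher-dimensional CGVF): a propagation term tangent to the level set plus a convergence term that pulls the robot onto it. All constants are illustrative:

```python
import numpy as np

def gvf_circle(p, k=2.0):
    # GVF for the unit circle, the zero-level set of phi(x, y) = x^2 + y^2 - 1.
    # E @ grad is the propagation term (tangent to the level set);
    # -k * phi * grad is the convergence term toward phi = 0.
    x, y = p
    phi = x**2 + y**2 - 1.0
    grad = np.array([2.0 * x, 2.0 * y])
    E = np.array([[0.0, -1.0], [1.0, 0.0]])   # 90-degree rotation
    return E @ grad - k * phi * grad

# Forward-Euler integration: the trajectory is attracted to the circle
# while circulating along it.
p = np.array([2.0, 0.0])
for _ in range(2000):
    p = p + 0.01 * gvf_circle(p)
print(np.hypot(p[0], p[1]))   # radius approaches 1
```

The coupling problem the abstract describes arises when several such implicit functions define the manifold jointly and their gradients enter the propagation term simultaneously.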


DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy

Wang, Yuran, Wu, Ruihai, Chen, Yue, Wang, Jiarui, Liang, Jiaqi, Zhu, Ziyu, Geng, Haoran, Malik, Jitendra, Abbeel, Pieter, Dong, Hao

arXiv.org Artificial Intelligence

Garment manipulation is a critical challenge due to the diversity in garment categories, geometries, and deformations. Despite this, humans can effortlessly handle garments, thanks to the dexterity of our hands. However, existing research in the field has struggled to replicate this level of dexterity, primarily hindered by the lack of realistic simulations of dexterous garment manipulation. Therefore, we propose DexGarmentLab, the first environment specifically designed for dexterous (especially bimanual) garment manipulation, which features large-scale high-quality 3D assets for 15 task scenarios, and refines simulation techniques tailored for garment modeling to reduce the sim-to-real gap. Previous data collection typically relies on teleoperation or training expert reinforcement learning (RL) policies, which are labor-intensive and inefficient. In this paper, we leverage garment structural correspondence to automatically generate a dataset with diverse trajectories using only a single expert demonstration, significantly reducing manual intervention. However, even extensive demonstrations cannot cover the infinite states of garments, which necessitates the exploration of new algorithms. To improve generalization across diverse garment shapes and deformations, we propose a Hierarchical gArment-manipuLation pOlicy (HALO). It first identifies transferable affordance points to accurately locate the manipulation area, then generates generalizable trajectories to complete the task. Through extensive experiments and detailed analysis of our method and baseline, we demonstrate that HALO consistently outperforms existing methods, successfully generalizing to previously unseen instances even with significant variations in shape and deformation where others fail. Our project page is available at: https://wayrise.github.io/DexGarmentLab/.


Appendix 1 Methods details

Neural Information Processing Systems

Two-body system initialization. The trajectories are initialized in a near-circular way. Three-body system initialization. For the chaotic three-body systems, we also apply initial-condition regularization such that the initial trajectories of the system are also near-circular. Examples of the generated two-body and three-body systems. We show a few examples of the two-body and three-body systems in Figure A1. As in Section 4.1, we show the latent space obtained from the recordings. The recordings are collected from surgically implanted electrode arrays and are thresholded and spike sorted when collected.
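A near-circular two-body initialization of the kind described above can be sketched as follows, assuming equal masses placed symmetrically about the origin: rotate each position by 90 degrees for the velocity direction, scale to the circular-orbit speed, then add Gaussian noise. The noise level matches the text; other constants are illustrative:

```python
import numpy as np

def init_two_body(r=1.0, m=1.0, sigma=0.05, G=1.0, rng=None):
    # Two equal-mass bodies orbiting their common center of mass (the origin).
    # Circular-orbit condition: v^2 / r = G m / (2r)^2, so v = sqrt(G m / (4r)).
    rng = rng if rng is not None else np.random.default_rng()
    p1, p2 = np.array([r, 0.0]), np.array([-r, 0.0])
    rot90 = np.array([[0.0, -1.0], [1.0, 0.0]])   # velocity direction
    speed = np.sqrt(G * m / (4.0 * r))
    v1 = speed * (rot90 @ p1) / r + rng.normal(0.0, sigma, 2)
    v2 = speed * (rot90 @ p2) / r + rng.normal(0.0, sigma, 2)
    return (p1, v1), (p2, v2)

(p1, v1), (p2, v2) = init_two_body(sigma=0.0)
print(v1, v2)   # opposite tangential velocities; total momentum is zero
```

With `sigma > 0`, repeated calls yield the diversified near-circular trajectories used for training.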


DVDP: An End-to-End Policy for Mobile Robot Visual Docking with RGB-D Perception

Min, Haohan, Li, Zhoujian, Yang, Yu, Chen, Jinyu, Yuan, Shenghai

arXiv.org Artificial Intelligence

Automatic docking has long been a significant challenge in the field of mobile robotics. Compared to other automatic docking methods, visual docking methods offer higher precision and lower deployment costs, making them an efficient and promising choice for this task. However, visual docking methods impose strict requirements on the robot's initial position at the start of the docking process. To overcome the limitations of current vision-based methods, we propose an innovative end-to-end visual docking method named DVDP (Direct Visual Docking Policy). This approach requires only a binocular RGB-D camera installed on the mobile robot to directly output the robot's docking path, achieving end-to-end automatic docking. Furthermore, we have collected a large-scale mobile robot visual docking dataset through a combination of virtual and real environments, using the Unity 3D platform and actual mobile robot setups. We developed a series of evaluation metrics to quantify the performance of the end-to-end visual docking method. Extensive experiments, including benchmarks against leading perception backbones adapted into our framework, demonstrate that our method achieves superior performance. Finally, real-world deployment on the SCOUT Mini confirmed DVDP's efficacy, with our model generating smooth, feasible docking trajectories that meet physical constraints and reach the target pose.
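Docking evaluation typically thresholds the final pose error. A minimal sketch of such a metric (the function name and thresholds are illustrative, not the paper's metric definitions):

```python
import math

def docking_pose_error(pred, target):
    # pred/target: (x, y, yaw). Returns (position error, heading error).
    # The heading difference is wrapped into (-pi, pi] before taking |.|,
    # so an error near +/- pi is not inflated toward 2*pi.
    dx, dy = pred[0] - target[0], pred[1] - target[1]
    pos_err = math.hypot(dx, dy)
    dyaw = (pred[2] - target[2] + math.pi) % (2.0 * math.pi) - math.pi
    return pos_err, abs(dyaw)

pos_e, yaw_e = docking_pose_error((0.05, -0.02, 0.1), (0.0, 0.0, 0.0))
print(pos_e, yaw_e)   # small position and heading errors
```

A success criterion would then require both errors to fall below chosen tolerances (e.g., a few centimeters and a few degrees).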


Beyond Pairwise Comparisons: Unveiling Structural Landscape of Mobile Robot Models

Naito, Shota, Ninomiya, Tsukasa, Wada, Koichi

arXiv.org Artificial Intelligence

Understanding the computational power of mobile robot systems is a fundamental challenge in distributed computing. While prior work has focused on pairwise separations between models, we explore how robot capabilities, light observability, and scheduler synchrony interact in more complex ways. We first show that the Exponential Times Expansion (ETE) problem is solvable only in the strongest model -- fully-synchronous robots with full mutual lights ($\mathcal{LUMT}^F$). We then introduce the Hexagonal Edge Traversal (HET) and TAR(d)* problems to demonstrate how internal memory and lights interact with synchrony: under weak synchrony, internal memory alone is insufficient, while full synchrony can substitute for both lights and memory. In the asynchronous setting, we classify problems such as LP-MLCv, VEC, and ZCC to show fine-grained separations between $\mathcal{FSTA}$ and $\mathcal{FCOM}$ robots. We also analyze Vertex Traversal Rendezvous (VTR) and Leave Place Convergence (LP-Cv), illustrating the limitations of internal memory in symmetric settings. These results extend the known separation map of 14 canonical robot models, revealing structural phenomena only visible through higher-order comparisons. Our work provides new impossibility criteria and deepens the understanding of how observability, memory, and synchrony collectively shape the computational power of mobile robots.


Take That for Me: Multimodal Exophora Resolution with Interactive Questioning for Ambiguous Out-of-View Instructions

Oyama, Akira, Hasegawa, Shoichi, Taniguchi, Akira, Hagiwara, Yoshinobu, Taniguchi, Tadahiro

arXiv.org Artificial Intelligence

Daily life support robots must interpret ambiguous verbal instructions involving demonstratives such as "Bring me that cup," even when objects or users are out of the robot's view. Existing approaches to exophora resolution primarily rely on visual data and thus fail in real-world scenarios where the object or user is not visible. We propose Multimodal Interactive Exophora resolution with user Localization (MIEL), a multimodal exophora resolution framework leveraging sound source localization (SSL), semantic mapping, vision-language models (VLMs), and interactive questioning with GPT-4o. SSL is utilized to orient the robot toward users who are initially outside its visual field, enabling accurate identification of user gestures and pointing directions. When ambiguities remain, the robot proactively interacts with the user, employing GPT-4o to formulate clarifying questions. Experiments in a real-world environment showed results that were approximately 1.3 times better when the user was visible to the robot and 2.0 times better when the user was not visible, compared to methods without SSL and interactive questioning. In our daily life, we frequently use verbal instructions that include demonstratives, such as "Take that for me," but for robots the target object is often unclear, and the user or object is often not in the robot's view. One of the challenges in the field of robotics is enabling daily life support robots to understand and execute tasks based on such instructions and situations [1]. To achieve this, implementing exophora resolution [2], [3] is essential. Exophora resolution involves identifying the referent -- whether a person or object -- associated with anaphora (demonstratives or pronouns) within utterances, based on the surrounding context of the speaker or listener.
For instance, if a user instructs the robot to "Bring me that cup," the robot must identify the target object corresponding to "that cup," even if there are many cups in the environment.
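The decision at the heart of such a pipeline, i.e., answer directly when one candidate clearly wins, otherwise ask a clarifying question, can be sketched as follows. This is a toy skeleton: `Candidate`, the scores, and the question template are hypothetical stand-ins for MIEL's VLM scoring and GPT-4o question generation:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Candidate:
    name: str
    score: float   # stand-in for a VLM match score of the referred object

def resolve_exophora(
    candidates: Sequence[Candidate],
    ask_user: Callable[[str], str],
    threshold: float = 0.2,
) -> str:
    # If the top candidate beats the runner-up by a clear margin, commit to it;
    # otherwise fall back to an interactive clarifying question.
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    if len(ranked) == 1 or ranked[0].score - ranked[1].score >= threshold:
        return ranked[0].name
    options = [c.name for c in ranked[:2]]
    answer = ask_user(f"Did you mean the {options[0]} or the {options[1]}?")
    return answer if answer in options else ranked[0].name

# Unambiguous case: the margin is large, so no question is asked.
pick = resolve_exophora(
    [Candidate("red cup", 0.9), Candidate("blue cup", 0.3)],
    ask_user=lambda q: "blue cup",
)
print(pick)
```

In the real system the fallback is richer: SSL first orients the robot so gestures become observable, and the question is generated from scene context rather than a fixed template.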